On the Convergence of Optimistic Policy Iteration

Author

John N. Tsitsiklis

Abstract

We consider a finite-state Markov decision problem and establish the convergence of a special case of optimistic policy iteration that involves Monte Carlo estimation of Q-values, in conjunction with greedy policy selection. We provide convergence results for a number of algorithmic variations, including one that involves temporal difference learning (bootstrapping) instead of Monte Carlo estimation. We also indicate some extensions that either fail or are unlikely to go through.
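As a rough illustration of the kind of scheme the abstract describes, the sketch below keeps a table of Q-values, draws one Monte Carlo trajectory per state-action pair under the currently greedy policy, and nudges each Q-value toward the observed total cost. The three-state toy problem, its costs and transition probabilities, the 1/t step size, and all function names are assumptions made for this example; this is not the paper's exact setting or its proof conditions.

```python
import random

N_STATES = 3      # non-terminal states 0, 1, 2 (hypothetical toy problem)
TERMINAL = 3      # absorbing, cost-free terminal state
ACTIONS = (0, 1)

def step(state, action, rng):
    """Assumed toy dynamics: return (cost, next_state)."""
    if action == 1:
        return 3.0, TERMINAL          # expensive jump straight to the end
    # action 0: unit cost, advances with probability 0.9, otherwise stays put
    if rng.random() < 0.9:
        return 1.0, state + 1
    return 1.0, state

def greedy(q, state):
    """Cost-minimizing action under the current estimates (ties go to action 0)."""
    return min(ACTIONS, key=lambda a: q[state][a])

def rollout(q, state, action, rng):
    """Total cost of one trajectory: take `action`, then act greedily w.r.t. q."""
    total = 0.0
    while True:
        cost, state = step(state, action, rng)
        total += cost
        if state == TERMINAL:
            return total
        action = greedy(q, state)

def optimistic_policy_iteration(n_iters=5000, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for t in range(1, n_iters + 1):
        alpha = 1.0 / t               # diminishing step size (an assumption)
        # One sampled return per (state, action), all generated under the
        # policy that is greedy for the Q-values at the start of the sweep.
        returns = {(s, a): rollout(q, s, a, rng)
                   for s in range(N_STATES) for a in ACTIONS}
        # Partial policy evaluation: move each estimate toward its sample.
        for (s, a), g in returns.items():
            q[s][a] += alpha * (g - q[s][a])
        # The policy is never stored: it is re-derived greedily from q,
        # so it changes as soon as the estimates change.
    return q, [greedy(q, s) for s in range(N_STATES)]

if __name__ == "__main__":
    q, policy = optimistic_policy_iteration()
    print(policy, q)
```

The "optimistic" part is that the policy is made greedy after every single-sample update instead of after a full policy evaluation. In this toy model a direct calculation gives the optimal choice as action 1 in state 0 and action 0 in states 1 and 2, which the sketch can be checked against.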


Similar resources

On The Convergence Of Modified Noor Iteration For Nearly Lipschitzian Maps In Real Banach Spaces

In this paper, we obtain the convergence of the modified Noor iterative scheme for nearly Lipschitzian maps in real Banach spaces. Our results contribute to the literature in this area of research.


Weighted Sup-Norm Contractions in Dynamic Programming: A Review and Some New Applications

We consider a class of generalized dynamic programming models based on weighted sup-norm contractions. We provide an analysis that parallels the one available for discounted MDP and for generalized models based on unweighted sup-norm contractions. In particular, we discuss the main properties and associated algorithms of these models, including value iteration, policy iteration, and their optim...


On the Ishikawa iteration process in CAT(0) spaces

In this paper, several $\Delta$-convergence and strong convergence theorems are established for the Ishikawa iterations for nonexpansive mappings in the framework of CAT(0) spaces. Our results extend and improve the corresponding results


Approximate Policy Iteration: A Survey and Some New Methods

We consider the classical policy iteration method of dynamic programming (DP), where approximations and simulation are used to deal with the curse of dimensionality. We survey a number of issues: convergence and rate of convergence of approximate policy evaluation methods, singularity and susceptibility to simulation noise of policy evaluation, exploration issues, constrained and enhanced polic...


Convergence of the multistage variational iteration method for solving a general system of ordinary differential equations

In this paper, the multistage variational iteration method is implemented to solve a general form of the system of first-order differential equations. The convergence of the proposed method is given. To illustrate the proposed method, it is applied to a model for HIV infection of CD4+ T cells and the numerical results are compared with those of a recently proposed method.



Journal:

Journal of Machine Learning Research

Volume 3

Publication date: 2002
